One main goal of open science is to bring transparency into all stages of the research process, promoting collaboration and increasing access to data and scientific findings (Hampton et al, 2015). Proponents of open science claim that open practices will “help to further democratize science, diversifying perspective and knowledge by promoting broader access for scientists in developing countries and at under-resourced institutions, fostering citizen science” (Hampton et al, 2015). Currently, open science is not as widely accepted as some would hope and this leads to problems of inclusivity and equity in the push for fully open science. The two biggest barriers to open practices for early career and minority scientists are financial and social (Bahlai et al., 2018). In some cases, scientists are expected to use personal funds to pay open access fees, which can be prohibitive for scientists with lower salaries (Bahlai et al., 2018). Socially, if early career scientists follow completely open-science practices, they will be published in journals with lower impact factors which is likely to hurt their chances of finding a job (Bahlai et al., 2018). Although many open science supporters are advocates of moving away from the impact factor metric, it’s important to note that this metric is still widely used to evaluate scientists today (McKiernan et al., 2016). It is unfair to expect early-career and underrepresented scientists to take the risks associated with open-science practices given that these practices could negatively impact their chances of landing a job. Furthermore, imposing transparency in areas such as the review process, can disproportionately hurt underrepresented scientists who have benefited from these practices that reduce bias in the review process (Bahlai et al., 2018).
In my work, I have started using open practices through Google Documents, Google Spreadsheets, and GitHub. Professionally, using open practices has been nerve-racking for me because it can easily expose mistakes. I have come to understand and appreciate that collaborative work is usually the most successful and offers more learning opportunities. Using GitHub in an academic setting has helped make me more comfortable with making my work flow open and I have started using it for assignments. This experience with GitHub has been invaluable as I have started working with the National Center for Ecological Analysis and Synthesis (NCEAS) as a data science intern. NCEAS uses GitHub and promotes collaboration among interns to get the work done. Despite all of the opportunities I have to use open practices, I still find myself limited by personal fears of humiliation from mistakes in my work. This has led me to start my work in a closed space until it feels perfected and only then move the work to an open platform such as GitHub. Limitations that I see in this semi-open process is the lack of transparency around the work and thought process that led to the final product. It is often valuable to see the different approaches that lead to the final product. I am working to overcome these limitations by encouraging myself to work more openly from the beginning of projects and by attending trainings promoting open science practices. I believe that education and practice is the only way to get over fears of failure and humiliation. Most importantly, I believe it is crucial for proponents of open science to be aware of and understand the challenges that come with open science, specifically for underrepresented students and early-career scientists. Supporting small steps taken by scientists practicing open science will be critical in ensuring inclusivity in open science (McKiernan et al., 2016).
Sources
Bahlai, C. et al (2018). Open Science Isn’t Always Open to All Scientists. American Scientist. https://www.americanscientist.org/article/open-science-isnt-always-open-to-all-scientists
Hampton, S. E., S. S. Anderson, S. C. Bagby, C. Gries, X. Han, E. M. Hart, M. B. Jones, W. C. Lenhardt, A. MacDonald, W. K. Michener, J. Mudge, A. Pourmokhtarian, M. P. Schildhauer, K. H. Woo, and N. Zimmerman. 2015. The Tao of open science for ecology. Ecosphere 6(7):120. http://dx.doi.org/10.1890/ES14-00402.1
McKiernan, E.C. et al. (2016). How open science helps researchers succeed. eLife. 2016; 5: e16800. doi: 10.7554/eLife.16800. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4973366/
# Get data
truckee <- read_csv("clean_truckee_flow.csv")
# Convert to ts data
truckee_ts <- ts(truckee$mean_va, frequency = 12, start = c(2000,1))
# plot(truckee_ts)
# Decompose to explore data further
truckee_dc <- decompose(truckee_ts)
plot(truckee_dc)
Although there are some high values and low values in the data, there do not seem to be any outliers (which could have a disproportionate effect on the time series model). There seems to be a slight downward trend in the data and it looks non-stationary and additive. We see a seasonal pattern that repeats about every 12 months and a slight cyclical trend about every five years (the seasonal component is on the same scale as the orinigal data).
# Holt Winters exponential smoothing
truckee_hw <- HoltWinters(truckee_ts)
# plot(truckee_hw)
# Forecast Holt Winters
truckee_forecast <- forecast(truckee_hw, h = 60)
# plot(truckee_forecast)
# Autoregressive integrated moving average (ARIMA) for comparison
# estimate pdq
truckee_pdq <- auto.arima(truckee_ts) # [2,1,1,][0,0,2]
# fit the ARIMA model
truckee_arima <- arima(truckee_ts, order = c(2,1,1), seasonal = list(order = c(0,0,2)))
# evaluate residuals
# par(mfrow = c(1,2))
# hist(truckee_arima$residuals)
# qqnorm(truckee_arima$residuals) # looks normal
# forecast ARIMA
forecast_truckee <- forecast(truckee_arima, h = 60)
# plot(forecast_truckee)
# Graph of Holt Winters
plot(truckee_forecast,
xlab = "Time",
ylab = "Truckee River Flows (cubic feet per second)")
par(mfrow = c(1,2))
hist(truckee_forecast$residuals)
qqnorm(truckee_forecast$residuals) # Looks relatively normally distributed.
# Read in nps data
nps_ca <- read_sf(dsn = ".", layer = "nps_boundary") %>%
filter(STATE == "CA",
UNIT_TYPE == "National Park") %>%
dplyr::select(UNIT_NAME) %>%
rename(Name = UNIT_NAME)
st_crs(nps_ca) = 4326
# Read in CA county data
ca_counties <- read_sf(dsn = ".", layer = "california_county_shape_file")
st_crs(ca_counties) = 4326
# Map it!
map_nps_ca <- tm_shape(nps_ca) +
tm_fill("Name", palette = "Dark2", alpha = 0.5) +
tm_shape(ca_counties) +
tm_basemap("Esri.WorldPhysical") +
tm_legend(show = FALSE) +
tm_borders()
tmap_mode("view")
map_nps_ca
# Read in data and initial wrangling
lizard <- read_csv("clean_lizard.csv") %>%
filter(site == "CALI") %>%
replace_with_na_all(condition = ~.x == ".") %>%
filter(sex == "F" | sex == "M") %>%
drop_na(weight)
# Coerce weights to be numeric
lizard$weight <- as.numeric(lizard$weight)
# Visualize data
# ggplot(lizard, aes(x = weight, fill = sex)) +
# geom_histogram(alpha = 0.5, position = "identity")
# Both male and female weights are skewed. However, the central limit theorem applies here - distribution of means will be normally distributed (we have more than 30 observations in each group)
# Vector of adult male lizard weights
male <- lizard %>%
filter(sex == "M") %>%
pull(weight)
# Vector of adult female lizard weights
female <- lizard %>%
filter(sex == "F") %>%
pull(weight)
# F test for equal variance
# H0: Ratio of variances (Variance A/Variance B) = 1
# HA: Ratio of variances is NOT = 1
lizard_f <- var.test(female, male)
# p-value = 0.29, retain the null hypothesis of equal variance
# T sample t-test
# H0: Mean weights of adult female and male lizards are equal
# HA: Mean weights of adult female and male lizards are NOT equal
lizard_t <- t.test(female, male, var.equal = TRUE)
# p-value = 0.43, retain the null hypothesis. For lizards trapped at the CALI site, mean weights of female and male adult lizards do not differ significantly.
# Mean weights and Cohen's d - effect size
male_lizard <- lizard %>%
dplyr::select(sex, weight) %>%
filter(sex == "M")
female_lizard <- lizard %>%
dplyr::select(sex, weight) %>%
filter(sex == "F")
male_mean <- mean(male_lizard$weight) # 4.96
male_sd <- sd(male_lizard$weight) # 5.68
female_mean <- mean(female_lizard$weight) # 6.50
female_sd <- sd(female_lizard$weight) #5.83
# Mean weight difference
mean_diff <- female_mean - male_mean # 0.86
effect_size <- cohen.d(weight ~ sex, data = lizard)
# 0.14 neglibile
Mean adult female lizard weights (6.50 g ± 5.83 g, n = 75) and adult male lizard weights (4.96 g ± 5.68 g, n = 57) for lizards trapped at the Caliche creosotebush site did not differ significantly [t(130) = 0.8, p = 0.427, \(\alpha\) = 0.05]. The effect size is negligible (Cohen’s d = 0.14) and the difference in mean weight between female and male lizards is 0.86 grams.
lizard_tails <- lizard %>%
dplyr::select(sex, tail) %>%
drop_na(tail)
# Chi-square
# H0: There is no significant difference between proportion of females and males with broken tails
# HA: There is a significant difference between proportion of females and males with broken tails
chi_tail <- lizard_tails %>%
count(sex, tail) %>%
spread(tail, n) %>%
dplyr::select(-sex)
rownames(chi_tail) <- c("F","M")
#Actual proportions:
tail_prop <- prop.table(as.matrix(chi_tail), 1)
tail_x2 <- chisq.test(chi_tail) # Retain the null, there is no significant difference in proportions of females and males with broken tails at the CALI site x2 = 0.163, df = 1, p-value = 0.69
There is no significant difference in the proportion of observed adult female (0.23) and male lizards (0.18) with broken tails at the Caliche creosotebush site (\(\chi^2\){1} = 0.16, p = 0.69).